38 research outputs found

    Storage Efficient Substring Searchable Symmetric Encryption

    Get PDF
    We address the problem of substring searchable encryption. A single user produces a big stream of data and later on wants to learn the positions in the string that some patterns occur. Although current techniques exploit auxiliary data structures to achieve efficient substring search on the server side, the cost at the user side may be prohibitive. We revisit the work of substring searchable encryption in order to reduce the storage cost of auxiliary data structures. Our solution entails a suffix array based index design, which allows optimal storage cost O (n) with small hidden factor at the size of the string n. We analyze the security of the protocol in the real ideal framework. Moreover, we implemented our scheme and the state of the art protocol [7] to demonstrate the performance advantage of our solution with precise benchmark results

    Characterizing the morbid genome of ciliopathies

    Get PDF
    Background Ciliopathies are clinically diverse disorders of the primary cilium. Remarkable progress has been made in understanding the molecular basis of these genetically heterogeneous conditions; however, our knowledge of their morbid genome, pleiotropy, and variable expressivity remains incomplete. Results We applied genomic approaches on a large patient cohort of 371 affected individuals from 265 families, with phenotypes that span the entire ciliopathy spectrum. Likely causal mutations in previously described ciliopathy genes were identified in 85% (225/265) of the families, adding 32 novel alleles. Consistent with a fully penetrant model for these genes, we found no significant difference in their “mutation load” beyond the causal variants between our ciliopathy cohort and a control non-ciliopathy cohort. Genomic analysis of our cohort further identified mutations in a novel morbid gene TXNDC15, encoding a thiol isomerase, based on independent loss of function mutations in individuals with a consistent ciliopathy phenotype (Meckel-Gruber syndrome) and a functional effect of its deficiency on ciliary signaling. Our study also highlighted seven novel candidate genes (TRAPPC3, EXOC3L2, FAM98C, C17orf61, LRRCC1, NEK4, and CELSR2) some of which have established links to ciliogenesis. Finally, we show that the morbid genome of ciliopathies encompasses many founder mutations, the combined carrier frequency of which accounts for a high disease burden in the study population. Conclusions Our study increases our understanding of the morbid genome of ciliopathies. We also provide the strongest evidence, to date, in support of the classical Mendelian inheritance of Bardet-Biedl syndrome and other ciliopathies

    The evolving SARS-CoV-2 epidemic in Africa: Insights from rapidly expanding genomic surveillance

    Get PDF
    INTRODUCTION Investment in Africa over the past year with regard to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequencing has led to a massive increase in the number of sequences, which, to date, exceeds 100,000 sequences generated to track the pandemic on the continent. These sequences have profoundly affected how public health officials in Africa have navigated the COVID-19 pandemic. RATIONALE We demonstrate how the first 100,000 SARS-CoV-2 sequences from Africa have helped monitor the epidemic on the continent, how genomic surveillance expanded over the course of the pandemic, and how we adapted our sequencing methods to deal with an evolving virus. Finally, we also examine how viral lineages have spread across the continent in a phylogeographic framework to gain insights into the underlying temporal and spatial transmission dynamics for several variants of concern (VOCs). RESULTS Our results indicate that the number of countries in Africa that can sequence the virus within their own borders is growing and that this is coupled with a shorter turnaround time from the time of sampling to sequence submission. Ongoing evolution necessitated the continual updating of primer sets, and, as a result, eight primer sets were designed in tandem with viral evolution and used to ensure effective sequencing of the virus. The pandemic unfolded through multiple waves of infection that were each driven by distinct genetic lineages, with B.1-like ancestral strains associated with the first pandemic wave of infections in 2020. Successive waves on the continent were fueled by different VOCs, with Alpha and Beta cocirculating in distinct spatial patterns during the second wave and Delta and Omicron affecting the whole continent during the third and fourth waves, respectively. Phylogeographic reconstruction points toward distinct differences in viral importation and exportation patterns associated with the Alpha, Beta, Delta, and Omicron variants and subvariants, when considering both Africa versus the rest of the world and viral dissemination within the continent. Our epidemiological and phylogenetic inferences therefore underscore the heterogeneous nature of the pandemic on the continent and highlight key insights and challenges, for instance, recognizing the limitations of low testing proportions. We also highlight the early warning capacity that genomic surveillance in Africa has had for the rest of the world with the detection of new lineages and variants, the most recent being the characterization of various Omicron subvariants. CONCLUSION Sustained investment for diagnostics and genomic surveillance in Africa is needed as the virus continues to evolve. This is important not only to help combat SARS-CoV-2 on the continent but also because it can be used as a platform to help address the many emerging and reemerging infectious disease threats in Africa. In particular, capacity building for local sequencing within countries or within the continent should be prioritized because this is generally associated with shorter turnaround times, providing the most benefit to local public health authorities tasked with pandemic response and mitigation and allowing for the fastest reaction to localized outbreaks. These investments are crucial for pandemic preparedness and response and will serve the health of the continent well into the 21st century

    Springer-Verlag, Lecture Notes in Computer Science, 2002 The enhanced suffix array and its applications to genome analysis

    No full text
    Abstract. In large scale applications as computational genome analysis, the space requirement of the suffix tree is a severe drawback. In this paper, we present a uniform framework that enables us to systematically replace every string processing algorithm that is based on a bottomup traversal of a suffix tree by a corresponding algorithm based on an enhanced suffix array (a suffix array enhanced with the lcp-table). In this framework, we will show how maximal, supermaximal, and tandem repeats, as well as maximal unique matches can be efficiently computed. Because enhanced suffix arrays require much less space than suffix trees, very large genomes can now be indexed and analyzed, a task which was not feasible before. Experimental results demonstrate that our programs require not only less space but also much less time than other programs developed for the same tasks.

    Replacing suffix trees with enhanced suffix arrays

    Get PDF
    www.elsevier.com/locate/jda The suffix tree is one of the most important data structures in string processing and comparative genomics. However, the space consumption of the suffix tree is a bottleneck in large scale applications such as genome analysis. In this article, we will overcome this obstacle. We will show how every algorithm that uses a suffix tree as data structure can systematically be replaced with an algorithm that uses an enhanced suffix array and solves the same problem in the same time complexity. The generic name enhanced suffix array stands for data structures consisting of the suffix array and additional tables. Our new algorithms are not only more space efficient than previous ones, but they are also faster and easier to implement

    116 The Median Problem for the Reversal Distance in Circular Bacterial Genomes

    No full text
    Abstract. In the median problem, we are given a distance or dissimilarity measure d, three genomes G1,G2, andG3, and we want to find agenomeG (a median) such that the sum �3 i=1 d(G, Gi) is minimized. The median problem is a special case of the multiple genome rearrangement problem, where one wants to find a phylogenetic tree describing the most “plausible ” rearrangement scenario for multiple species. The median problem is NP-hard for both the breakpoint and the reversal distance [5, 14]. To the best of our knowledge, there is no approach yet that takes biological constraints on genome rearrangements into account. In this paper, we make use of the fact that in circular bacterial genomes the predominant mechanism of rearrangement are inversions that are centered around the origin or the terminus of replication [8, 10, 18]. This constraint simplifies the median problem significantly. More precisely, we show that the median problem for the reversal distance can be solved in linear time for circular bacterial genomes.

    Sequencing and assembly of the Egyptian buffalo genome.

    No full text
    Water buffalo (Bubalus bubalis) is an important source of meat and milk in countries with relatively warm weather. Compared to the cattle genome, a little has been done to reveal its genome structure and genomic traits. This is due to the complications stemming from the large genome size, the complexity of the genome, and the high repetitive content. In this paper, we introduce a high-quality draft assembly of the Egyptian water buffalo genome. The Egyptian breed is used as a dual purpose animal (milk/meat). It is distinguished by its adaptability to the local environment, quality of feed changes, as well as its high resistance to diseases. The genome assembly of the Egyptian water buffalo has been achieved using a reference-based assembly workflow. Our workflow significantly reduced the computational complexity of the assembly process, and improved the assembly quality by integrating different public resources. We also compared our assembly to the currently available draft assemblies of water buffalo breeds. A total of 21,128 genes were identified in the produced assembly. A list of milk virgin-related genes; milk pregnancy-related genes; milk lactation-related genes; milk involution-related genes; and milk mastitis-related genes were identified in the assembly. Our results will significantly contribute to a better understanding of the genetics of the Egyptian water buffalo which will eventually support the ongoing breeding efforts and facilitate the future discovery of genes responsible for complex processes of dairy, meat production and disease resistance among other significant traits
    corecore